The disclosure of diagnosis codes can breach research participants' privacy

Authors

  • Grigorios Loukides
  • Joshua C. Denny
  • Bradley Malin
Abstract

OBJECTIVE De-identified clinical data in standardized form (eg, diagnosis codes), derived from electronic medical records, are increasingly combined with research data (eg, DNA sequences) and disseminated to enable scientific investigations. This study examines whether released data can be linked with identified clinical records that are accessible via various resources to jeopardize patients' anonymity, and whether popular privacy protection methodologies can prevent such an attack.

DESIGN The study experimentally evaluates the re-identification risk of a de-identified sample of Vanderbilt's patient records involved in a genome-wide association study. It also measures the level of protection from re-identification, and the data utility, provided by suppression and generalization.

MEASUREMENT Privacy protection is quantified using the probability of re-identifying a patient in a larger population through diagnosis codes. Data utility is measured at a dataset level, using the percentage of retained information as well as its description, and at a patient level, using two metrics based on the difference between the distribution of International Classification of Diseases (ICD) version 9 codes before and after applying privacy protection.

RESULTS More than 96% of 2800 patients' records are shown to be uniquely identified by their diagnosis codes with respect to a population of 1.2 million patients. Generalization is shown to reduce the percentage of uniquely identifiable records by less than 2%, and over 99% of the three-digit ICD-9 codes need to be suppressed to prevent re-identification.

CONCLUSIONS Popular privacy protection methods are inadequate to deliver a sufficiently protected and useful result when sharing data derived from complex clinical systems. The development of alternative privacy protection models is thus required.
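The measurement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy records, and the choice to match records on identical code sets are all assumptions made for illustration. It estimates the probability of re-identifying each sample record as one over the number of population records sharing its diagnosis codes, then repeats the estimate after generalizing codes to their three-digit ICD-9 categories.

```python
from collections import Counter


def generalize(codes):
    """Map each ICD-9 code to its three-digit category, e.g. '250.01' -> '250'."""
    return frozenset(code.split(".")[0] for code in codes)


def reidentification_risks(sample, population, transform=frozenset):
    """Per-record risk: 1 / (population records with the same transformed code set).

    Assumption: an attacker links records by exact match on the code set.
    A sample record with no match in the population gets risk 0.0.
    """
    counts = Counter(transform(record) for record in population)
    risks = []
    for record in sample:
        matches = counts[transform(record)]
        risks.append(1.0 / matches if matches else 0.0)
    return risks


def fraction_unique(sample, population, transform=frozenset):
    """Share of sample records that match exactly one population record."""
    risks = reidentification_risks(sample, population, transform)
    return sum(1 for risk in risks if risk == 1.0) / len(risks)


# Hypothetical toy data: each record is a patient's set of ICD-9 codes.
population = [
    {"250.01", "401.9"},
    {"250.02", "401.9"},
    {"272.4"},
    {"272.4"},
    {"493.90", "530.81"},
]
sample = [population[0], population[2], population[4]]

print(fraction_unique(sample, population))              # original codes
print(fraction_unique(sample, population, generalize))  # three-digit codes
```

In this toy population, generalization merges the first two records into one group, so fewer sample records remain unique. The abstract reports that on real data the effect is far smaller, which is why the records stay identifiable.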

Similar resources

Analyzing Tools and Algorithms for Privacy Protection and Data Security in Social Networks

The purpose of this research is to study factors influencing privacy concerns about data security and protection on social network sites and their influence on self-disclosure. 100 articles about privacy protection, data security, information disclosure and information leakage on social networks were studied. Types of models and algorithms and their repetition in articles have been distinguished a...

Anonymization and De-anonymization of Social Network Data

Adversary: Somebody who, whether intentionally or not, reveals sensitive, private information; Adversarial model: Formal description of the unique characteristics of a particular adversary; Attribute disclosure: A privacy breach wherein some descriptive attribute of somebody is revealed; Identity disclosure: A privacy breach in which a presumably anonymous person is in fact identifiable; k-P-anonym...

Big Data in the Campus Landscape: Security and Privacy

Privacy is a simple term for two complicated concepts. At its most basic layer, privacy is always about people and their control of their personal information. Information privacy protects individuals by protecting information about them from unauthorized disclosure (think compliance with FERPA, HIPAA, IRB regulations, or state breach notification laws). Autonomy privacy protects individuals by...

Examining Privacy Regulatory Frameworks in Canada in the Context of HIV

In the process of receiving perinatal care, women living with HIV (WLWH) in Canada have experienced disclosure of their HIV status without their express consent. This disclosure often occurs by well-intentioned healthcare providers; however, from the perspective of WLWH, it is a breach of confidentiality and leaves WLWH to manage the consequences. This paper is a critical review of the regulato...

Data Mining as a Tool in Privacy-preserving Data Publishing

Many databases contain data about individuals that are valuable for research, marketing, and decision making. Sharing or publishing data about individuals is however prone to privacy attacks, breaches, and disclosures. The concern here is about individuals’ privacy—keeping the sensitive information about individuals private to them. Data mining in this setting has been shown to be a powerful to...

Journal:
  • Journal of the American Medical Informatics Association : JAMIA

Volume 17, Issue 3

Publication date: 2010